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Abstract 

In plant and animal breeding studies a distinction is made between the 
commercial value (additive + epistatic genetic effects) and the breeding 
value (additive genetic effects) of an individual since it is expected that 
some of the epistatic genetic effects will be lost due to recombination. 
In this paper, we argue that the breeder can take advantage of some of 
the epistatic marker effects in regions of low recombination. The models 
introduced here aim to estimate local epistatic line heritability by using 
the genetic map information and combine the local additive and epistatic 
effects. To this end, we have used semi-parametric mixed models with 
multiple local genomic relationship matrices with hierarchical testing de- 
signs and lasso post-processing for sparsity in the final model and speed. 

Keywords & Phrases: Genomic selection, Genome wide association. Plant / 
animal breeding, Mixed model. Multiple kernel learning, Heritability 

1 Introduction 

Selection in animal or plant breeding is usually based on estimates of genetic 
breeding values (GEBV) obtained with semi- parametric mixed models (SPMM). 
In these mixed models genetic information in the form of a pedigree or markers 
are used to construct an additive kernel matrix that describes the similarity of 
line specific additive genetic effects. These models have been successfully used 
for predicting the breeding values in plants and animals. The studies show that 
using similarities calculated from sufficient genome wide marker information 
almost always lead to better prediction models for the breeding values compared 
to the pedigree based models. In both simulation studies and in empirical 
studies of dairy cattle, mice and in bi-parental populations of maize, barley and 
Arabidopsis marker based SPMM GEBVs have been remarkably accurate. 
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A SPMM for the n x 1 response vector y is expressed as 



y = Xp + Zg + e 



(1) 



where X is the n x p design matrix for the fixed effects, /3 is a p x 1 vector of 
fixed effects coefficients, Z is the nx q design matrix for the random effects; the 
random effects {g' , e')' are assumed to follow a multivariate normal distribution 
with mean and covariance 



The kernel of the marker based SPMM's and reproducing kernel Hilbert 
spaces (RKHS) regression models have been stressed recently ([9J). In fact, the 
connection have been recognized long time ago by [T7], [15], [37], [32]. RKHS 
regression models use an implicit or explicit mapping of the input data into 
a high dimensional feature space defined by a kernel function. This is often 
referred to as the "kernel trick" ([29]). It is possible to say that RKHS regression 
extends SPMM's by allowing a wide variety of kernel matrices, not necessarily 
additive, calculated using a variety of kernel functions. The common choices 
for kernel functions are the linear kernel function, polynomial kernel function, 
Gaussian kernel function though many other options are available. 

A kernel function, /(;(., .) maps a pair of input points x and x' into real 
numbers. A kernel function is by definition symmetric {k{x,x') = k{x',x)) 
and non-negative. Given the inputs for the n individuals we can compute a 
kernel matrix K whose entries are Kij = k{xi,Xj). The linear kernel function 
is given by k{x; y) — x'y. The polynomial kernel function is given by k{x; y) = 
[x'y + cY for c and d ^ R. Finally, the Gaussian kernel function is given by 
k{x;y) = exp{—{x' — y)'{x' — y)/h) where h > 0. Taylor expansions of these 
kernel functions reveal that each of these kernels correspond to a different feature 
map. 

For the marker based SPMM's, a genetic kernel matrix calculated using a 
linear kernel matrix incorporates only additive effects of markers. A genetic 
kernel matrix based on the polynomial kernel of order k incorporates all of the 
one to k order monomials of markers in an additive fashion. The Gaussian kernel 
function allows us to implicitly incorporate the additive and complex epistatic 
effects of the markers. 

Simulation studies and results from empirical experiments show that the 
prediction accuracies of models with Gaussian or polynomial kernel are usually 
better than the models with linear kernel. However, it is not possible to know 
how much of the increase in accuracy can be transfered into better generations 
because some of the predicted epistatic effects that will be lost by recombination. 
This issue touches the difference between the commercial value of a line which is 
defined as the overall genetic effect (additive+epistatic) and the breeding value 
which is the potential for being a good parent (additive) and it can be argued 
that linear kernel model estimates the breeding value where as the Gaussian 




where K is a q x q kernel matrix. 
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kernel estimates the commercial value. In this article, we argue that the breeder 
can take advantage of some of the epistatic marker effects in regions of low 
recombination. The models introduced here aim to estimate local epistatic line 
heritability by using the genetic map information and combine the local main 
and epistatic effects. Since the epistatic effects that are incorporated are only 
local there is little chance that these effects will disapear with recombination. 
Heritability is defined as the percentage of total variation that can be explained 
by the genotypic component. One can similarly argue that SPMM's with linear 
kernels produce estimates of narrow sense line heritability, and the SPMM's 
with Gaussian Kernel produces estimates of broad sense line heritability. We 
expect that the estimates of local heritability developed in this paper to be 
between narrow and broad sense heritability. 

An issue with the over the shelf kernel functions like linear or Gaussian 
kernels is that same kernel matrix is used no matter what trait is considered 
and that all markers are assigned equal weighs in the analysis. When the relation 
matrix is not task-specific, it is often one that forces the solution to be overly 
smooth. 

We propose several approaches for local kernel matrix calculation. Our final 
models are SPMM's with semi-supervised kernel matrix that is obtained as a 
function of many local kernels. They differ mainly in the way the local kernel 
matrices and their weights are calculated. One major aim of this article is to 
measure and incorporate additive and local epistatic genetic contributions since 
we believe that the local epistatic effects are relevant to the breeder. 

The local heritability models in this article can be adjusted so that genetic 
contribution of the whole genome, the chromosomes, or local regions can be 
obtained. In the following sections, we will discuss several ways in which this 
information can be useful to the breeder and we will illustrate a breeding scheme 
where the local instead of genome wide effects are utilized. 

2 Multiple kernel learning with SPMM 

In recent years, several methods have been proposed to combine multiple kernel 
matrices instead of using a single one. These kernel matrices may correspond to 
using different notions of similarity or may be using information coming from 
multiple sources. For example, genomic kernel -I- pedigree kernel, chromosome 
model, linear mixed models with linear covariance structure. A good review 
and taxanomy of multiple kernel learning algorithms in the machine learning 
literature can be found in [T^ . 

Multiple kernel learning methods use multiple kernels by combining them 
into a single one via a combination function. The most commonly used combi- 
nation function is linear. Given kernels Ki, K2, . . . , Kp, a linear kernel is of the 
form 

K = T]lKl + T]2K2 + . . . + T]pKp. 

The kernel weights 771 , 772 , . . . , T?p are usually assumed to be positive and this cor- 
responds to calculating a kernel in the combined feature spaces of the individual 
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kernels. We will also assume that the weights sum to one. 

The components of K are usually input variables from different sources or 
different kernels calculated from same input variables. The kernel K can also 
include interaction components like Ki © Kj , Ki (g) Kj , or perhaps — {Kt — Kj)Q 
{Kj — Ki). For example, if Ke is the environment kernel matrix and Kq is the 
genetic kernel matrix, then a component Ke Kq can be used to capture the 
gene by environment interaction effects. 

The mixed models in [5] use A Q A to capture interaction effects. The 
reasoning comes from (Falconer and Mackay, 1996): 

"If one assumes no dominance, all terms will vanish except the terms for 
additive and additive x additive variances, which will take the form Ci'i = 
2fi>ial + (2/i'i)^CT^^ where fi'i is the COP between individuals i' and i, cr^ is the 
additive genetic variance, and aaa is the additive x additive genetic variance. 
Assuming linkage and identity equilibrium, it seems justified to use (2/;/^)^, 
which in matrix notation can be represented hy (AqA) = A as the coefficient of 
the additive x additive component" (where is the element-wise multiplication 
operator)." 

Although some multiple kernel approaches use fixed weights for combining 
kernels, in most cases the weight parameters need to be learned from the training 
data. Some principled techniques used to estimate these parameters include 
likelihood based approaches in the mixed modeling framework like Fisher scoring 
algorithm or variance least squares approach though these approaches are more 
suitable to cases where only a few kernels are being used. 

[24] propose two simple heuristics to select kernel weights in regression prob- 
lems: 



l^h=l ' h 

and 



where is the Pearson correlation coefficient between true response and the 
predicted response and Mm is the mean square error generated by the regression 
using the kernel matrix Km alone. Another approach in |24j uses the kernel 
alignment: 

_ A{Km,yy') 

""^-El^.AiK.^yy') 
where kernel alignment is calculated using 

{Ki,K2)e 



A{K,,K2 
and 



^{K,,K^)p{K2,K2), 



N 

5](i^l),,(if2). 
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2.1 Multiple kernel SPMM models 

In the context of the SPMM's we propose using weights that are proportional 
to the estimated variances attributed to the kernels. One possible approach is 
to use a SPMM with multiple kernels in the form of 

V = Xfi + Zig, + Z2fif2 + • • • + ZkOk + e (2) 

where ~ Nq^ (0, ag_Kj) for j = 1, 2, . . . , fc. Let a^. for j = 1, 2, . . . , fc and 
be the estimated variance components. Under this model calculate heritabilities 
as h^n = ^l^/(J2Li + ^e) for TO = 1, 2, . . . , fc. 

Another model incorporates the marginal variance contribution for each ker- 
nel matrix. For this we use the following SPMM: 

y = Xp + Z,g^ + Z^,g_^ + e (3) 

where Qj ^ Nq^{Q,cr'^.Kj) for j = l,2,...,fc. is the random effect corre- 
sponding to the input components other than the ones in group j. In this case 
calculate heritabilities as = d-g^/{(Tg^ + <5'g_„ + <^ern) ^^r m = 1, 2, . . . , fc. 

A simpler approach is to use a separate SPMM for each kernel. Let 
and (Tg^ be the estimated variance components from the SPMM model in ((T]) 
with kernel K — K,n- Let /i^ = o^1^/{(^1^ + ^1^)- Note that, in this case, 
the markers corresponding to the random effect g_j which mainly accounts for 
the sample structure can now be incorporated by a fixed effects via their first 
principal components. 

After heritabilities are obtained, calculate the kernel weights ?7i , 772 , . . . , r/p 

as 

Z^/i=i 'hi 

The estimates of parameters for models in ([T]), ([2]) and ([3]) can be by maximiz- 
ing the likelihood or the restricted (or residual, or reduced) maximum likelihood 
(REML). There are very fast algorithms devised for estimating the parameters 
of the single kernel model in ([1]). However, an advantage with the multiple ker- 
nel approach in models ([2]) and ([3]) is that they can be used for testing nested 
models through the likelihood ratio test. Estimating the parameters of Model 
([2|) gets very difficult with large number of kernels and with large sample sizes, 
the single kernel or the marginal kernel models are more suitable in such cases. 

2.2 An ensemble model using lasso 

2.3 Kernels for genomic variables 

The simplest way we can obtain local kernel matrices is by defining regions 
in the genome and calculating a separate kernel matrix for each group and 
region. The regions can be overlapping or discrete. If the some markers are 
associated with each other in terms of linkage or function it might be useful 
to combine them together. The whole genome can be divided physically into 
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chromosomes, chromosome arms or hnkage groups. Further divisions could be 
based on recombination hot-spots, or just merely based on local proximity. We 
could calculate a separate kernel for introns and exons, non coding, promoter or 
repressor sequences. We can also use a grouping of markers based on their effects 
on low level traits like lipids, metabolites, gene expressions, or based on their 
allele frequencies. When some markers are missing for some individuals, we can 
calculate a kernel for the presence and absence states for these markers. When 
no such guide is present one can use a hierarchical clustering of the variables. 
It is even possible to incorporate group memberships probabilities for markers 
so the markers have varying weights in different groups. 

The second approach which we refer to as kernel scanning requires a linkage 
map of the markers. This approach is similar to the one in [13] where the 
chromosomes are scanned with windows of 5 consecutive markers. Let M be 
the q X p matrix of p markers on q lines, which is partitioned with respect 
to the chromosomes as (Mi,M2, . . . , Mc) where Mk has pk columns. Let the 
cumulative distances based on LD between markers in each chromosome be 
provided in a vector for k = 1,2, ... ,c. Based on pj. we can to obtain a kernel 
matrix for markers in each chromosome using a kernel function and by combining 
these chromosome specific kernel matrices in block diagonal form we obtain a 
pxp kernel matrix S for markers. Let the k column of this matrix be represented 
as Sfe- A local kernel matrix Kk at position k involves using diag{sky/^M in 
kernel matrix calculations. Kernel scanning approach involves calculation of a 
kernel matrix for selected marker across the genome at each marker location. By 
adjusting the kernel width parameter, we are able to determine the smoothness 
and locality of these kernel matrices. 

One argument for why we would like to focus on short segments of the 
genome as distinct structures comes from the "building blocks" hypothesis in 
the evolutionary theory. The schema theorem of Holland predicts that a com- 
plex system which uses evolutionary mechanisms such as fitness, recombination 
and mutation tend to generate short and well fit structures, these basic struc- 
tures serve as building blocks. For example, when the alleles associated to an 
important fitness trait are scattered all around the genome the favorable effects 
can easily be lost just by independent segregation, therefore inversions that 
clump these alleles together physically would be strongly selected for. 

When the number of markers in a region is less than the number of indi- 
viduals in the training set the kernel matrix for this region becomes singular 
or ill conditioned. In these cases we can use shrinkage approaches to obtain 
well conditioned positive definite kernel. The shrinkage estimators of [2S] and 
[T8] that were advocated in [33] and [7] which involves shrinkage towards the 
identity matrix are not suitable to use with the SPMM since this involves al- 
locating a fixed proportion of the error variance to the variance of the random 
effects. We instead propose and use shrinkage estimators which aim to intro- 
duce sparsity in the off diagonal elements of the kernel matrix. Many algorithms 
have been devised for learning sparse covariance matrices in the recent years. 
A penalized maximum likelihood estimation was developed in |8], a penalized 
regression method was used in j21j . These and some other sparse covariance 
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estimation techniques are implemented in an R package "huge" 

In addition to possible decrease of computational burden by use of sparse 
matrix methods in mixed model parameter estimation, we can produce graph- 
ical representations of the kernel matrices. For a normally distributed random 
vector, the independence between two components is implied by zero covari- 
ance between the components, more interestingly, the conditional independence 
between two components is implied by the zero components in the inverse co- 
variance matrix. In Figure [1] we display the graphical representation of the 
realized relationship matrix. 

lambda = 0.28 




Figure 1: Barley CAP. The different plants are represented by the dots and 
the nonzero relationship coefficients are represented by the lines between them. 
The graphical representation of the shrunk relationship matrix gives us an idea 
about the structure of the Barley CAP population. 

2.4 Hierarchical testing for sparsity and speed 

Although somewhat different in all of the above models, the kernel weights 
ryi , 772 , . . . , ?7p can be interpreted as the contribution components to the response 
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variable. In the context of Model (1) certain tests are devised for testing whether 
CTg is zero against the one sided alternative. 

Under Model (1), twice the log-likelihood of y given the parameters /3, Cg 
and A = cTg/fTg is, up to a constant, 

where V\ = /„ + XZKZ' and n is the size of the vector y. 

Twice the residual log-likelihood ([26]j [H]) is, up to a constant. 



Rm,alX) = -{n-p)\ogal-\og\V,\-\og{X'V,X)- ^y Xp,yV-\y X^) 



(6) 

From both of these likelihoods, a test statistic for the significance of the 
variance component 

Ho : A-0 (ct^=0) 
Ha ■■ A > (a^ > 0) 

can be obtained by calculating the likelihood ratio statistic (f5]). 

Under standard regularity conditions the null distribution of the likelihood 
ratio test statistic has a distribution with degrees of freedom given by the 
difference in the number of parameters between the null and alternative hy- 
pothesis. However, showed that the asymptotic distribution is a weighted 
mixture of distributions. For the SPMM in (1) they have recommended 
using a equally weighted mixture of x^(0) distribution where x^(0) 

distribution refers to a distribution degenerate at 0. In simulation studies [23], 
[22] found that equal contributions work well with residual log-likelihood where 
as a 0.65 to 0.35 mixture works better for the log-likelihood. A finite sample 
null distribution was recommended for the SPMM in (1) in [6] and it was shown 
that the mixture proportions depended on the kernel matrix. 

In many practical cases a thresholding method can be sufficient for the pur- 
poses of identifying regions that contribute to phenotype variation. Relevant 
regions are divided further subregions and the procedure is repeated to a de- 
sired detail level. Nevertheless, suitable hierarchical testing procedures have 
been developed. [5] proposed and analyzed several hierarchical designs in terms 
of their cost / power properties. Multiple testing procedures where coarse to 
fine hypotheses are tested sequentially have been proposed to control the family 
wise error rate or false discovery rate ([25], [20]). These procedures can be used 
along the "keep rejecting until first acceptance" scheme to test hypotheses in 
an hierarchy. 

Meinshausen's hierarchical testing procedure controls the family wise error 
by adjusting the significance levels of single tests in the hierarchy. The procedure 
starts testing the root node H^ at level a. When a parent hypothesis is rejected 
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one continues with testing all the child nodes of that parent. The significance 
level to be used at each node H is adjusted by a factor proportional to the 
number of variables in that node: 



where |.| denotes the cardinality of a set. This means that larger penalty is 
incurred at finer levels. The inheritence procedure in provides a uniform 
improvement over the method by Meinshausen. An hypothetical hierarchical 
test is displayed in Figure [2] 




Figure 2: An hypothetical hierarchical test set up for an organism with 3 chro- 
mosomes. The first test is at the whole genome level. It continues by testing 
the significance of each chromosome and regions of the chromosome. 
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In most GWAS studies the focus is on estimating the effects of individual 
markers and lower level interactions. However, in the genomic era, the number of 
SNP markers can easily reach millions and the methods used in GWAS for large 
samples become compiitationally exhaustive. The local kernel approach devel- 
oped in this article remedies this problem by reducing the number of hypothesis 
by focusing on regions and testing the nested hypothesis in an hierarchy. 

2.5 A multiple kernel model with lasso penalty for spar- 

sity 

Although we can include sparsity in our multiple kernel model by use of hier- 
archical testing procedures described in the previous section, we can also ac- 
complish this by means of a general additive model with lasso penalty post- 
processing formulation. 

Each multiple local kernel SPMM model discussed in previous section can 
be utilized to obtain EBLUPs from the specific regions. Let x be the p vector 
of fixed effects and m = (mi, m2, . . . , mk) be the vector of markers partitioned 
into k regions. Let gj{m) denote the EBLUPs of random effect components 
that correspond to the k local kernels for regions j = 1,2, ... ,k and individual 
with markers m. Consider a final prediction model in the following form: 

k k+p 

F{x,Tn,l3) = l3o + ^Pjgj{m)+ ^ /3jXj. (7) 

j=l j=k+l 

Estimate the model coefficients using the following loss function 

N k k+p k 

^ t=l j=l 3 = k+l j=l 

A > is the shrinkage operator, larger values of A decreases the number of 
models included in the final prediction model. 

When k is large compared to the sample size N, we should use the following 
loss function 

N k k+p k k 

/3 = argmin^(y,-(/3o+^/3jff,(mO+ ^ l3,Xji)f + XiJ2\l3j\+X2j2Wjf 

^ i=l j=l j=k+l 3 = 1 3=1 

(9) 

to allow for more than N non zero coefficients in the final estimation model. 
Ai, A2 > are the shrinkage operators. 

The authors are aware that there are other methods which can introduce 
shrinkage in the parameters like subset selection, partial least squares, principal 
components regressions, Bayesian lasso, etc... But in essence all these algorithms 
should give similar results. 
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3 Illustrations 



In this section, we will compare the methods introduced in this article to some 
of the existing ones. Our example uses simulated markers and phenotypes and 
so the truth is known. We can use our model to predict the breeding value of 
new genotypes and, in addition, by using the weights or the likelihood ratio test 
statistic for Model (1) for each of the kernels for different regions we can evaluate 
the contribution of each region to the total genetic variance. Instead of finding 
markers that point to the location of the quantitative trait loci, our approach 
tries to determine the local effects. We use the Gaussian kernel through out our 
experiments. 

Example 3.1. The data in this example was generated by a whole genome sim- 
ulator "hypred" fT^ which is an R package that simulates high density genomic 
data. Markers for each of the 7 chromosomes of length IM are simulated for 
individuals which were produced randomly mating two founder lines for 20 gen- 
erations. The total number of markers was 3000. On each chromosome 20 QTL 
positions and additive effects were randomly generated. Residual variance was 
set to adjust the heritability of the trait to 0. 75. All marker effects are additive. 
The number of individuals is set to 1000. 750 of these individuals were ran- 
domly chosen to the training set and remaining to the test data set. In Figure 
(3 we compare the accuracies in the test data using the correlation scores between 
the observed and predicted trait values for the linear and Gaussian kernel mod- 
els with the multiple kernel models that divide the genome into pieces differing 
number of regions. Number of regions were set to 2^ = 4, 3^ = 9,..., 15^ = 225 
(two levels of hierarchy) . The experiment was repeated 30 times. 

In Figure local heritabilities for 30 non overlapping sections are displayed 
for one realization of this experiment. In Figure we displayed the local her- 
itability values from the kernel scanning approach. Finally in Figure 11 , we 
display the results from a hierarchical testing procedure. 

While we may have confidence that GS can accelerate short-term gain, no 
such confidence is justified for long-term gain. Beyond the first cycles of selec- 
tion, mechanisms the effects of which are difficult to predict analytically begin 
to operate. In [10] and [16] a weighted GS model was used so that markers for 
which the favorable allele had a low frequency should be weighted more heavily 
to avoid losing such alleles. In this article we recommend using the similar ap- 
proach which aims to conserve rare but favorable alleles for balancing short-term 
and long-term gains from GS. Let gji for regions j = 1, 2, . . . , fc and individuals 
i = l,2,...,A^be the EBLUPs of random effect components that correspond to 
the k local kernels and let pji denote the estimated density value of the alleles 
in region j for individual i. A selection criterion in the spirit of |16j for the ith 
individual is then 



However, this criterion also down weights favorable but common alleles. 



k 




(10) 
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Figure 3: Cross validated accuracies measured in terms of correlation scores for 
simulated data described in Example 1. 



A better approach is using a weighting scheme that can up weight both rare 
and common favorable alleles based on breeders preferences. Let be the 

maximum EBLUP value among the individuals for region j and let rjj be the 
weight of the kernel j. The selection criterion 



Vj 



^ 2irhisd{gj)h2sd{pj) 



exp{- 



{9ji-9jin)? 

h\var{gj) h^var^pj) 



+ 



} (11) 



where the constants h\ and /12 are selected by the breeder based on preferences 

given to short term and long term gains correspondingly. 

More importantly the estimates of local heritabilities and the estimates for 
the local effects for the lines in the breeding population can be used to ap- 
proximate the distribution of the; phcinotype for th(^ individual crosses. This 
information is as important to tlie breeder as a good prediction model since it 
gives a guide for action. 

Let Qi ~ {9ji)j=i and §2 — (,9j2)j=i be the EBLUP values of two individuals 
for the k regions of the genome. A typical cross between two double haploid 
individuals will have a breeding value in the form 



[0,l,0,0,...,0,l]'gi + [l,0,l,l,...,l,0]'g2 



(12) 



since each region is inherited from either of the parents. If we can assume 
the assortment is independent between each region and with equal chance an 
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Figure 4: Local kernel weights for non overlapping sections 
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Figure 5: The figure on the left illustrates the sparsity pattern obtained using 
the hierarchical testing procedure for one instance of the experiment described 
in Example 1 We have used the hierarchical testing procedure of Meiwaussen 
to identify the relevant regions of the genome for the data in Example 1. Only 
the regions with significant effects were used to build the multiple kernel model 
(regions with value are not included in the final model). This model (M) is 
compared with the SPMM's with linear and Gaussian kernels (Lin and Gauss) 
with the boxplots on the right. All the models have approximately the same 
prediction ability, but the multiple kernel model is definitely more parsimonious 
(parts of genome included in the final prediction model are indicated by ones 
on the left graph). 
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approximation of the distribution of the phenotype for the off springs of these 
plants can be obtained by Markov chain Monte Carlo techniques. The law of 
independent assortment always holds true for genes that are located on different 
chromosomes, but for genes that are on the same chromosome, it does not always 
hold true. 

If the regions of interest are in linkage then the mapping information can bu 
used to generate the sequences of zeros and ones using the multivariate bino- 
mial logit-normal (MVBLN) distribution [3] which incorporates the dependency 
between the regions. The MVBLN model in [3] can be written for our purposes 
of idealizing the regions r = {rj}'-^^ as 

logit{p) '^ Nkifi,^) 

and 

rj\pj - bin{l,pj). 

A^fc(/x, S) is the k dimensional multivariate normal model with mean fi and 
covariance E = {<Tjji). hin{n,p) is the binomial distribution with sample size n 
and probability of success p. The logit{p) denotes the logits (log(p/(l — p))) of 
the individual elements of p. Then, using the results in 

E{r,) ^ E{E{r,\p,)} ^ E{p,) 

and 

cov{rj , rj, ) = cov{pj , pj, ) + I{j = j')Epj{l -pj). 

Keeping only the first term of the Taylor series and letting p* — (e'^/(l + e**)) 
where the exponential operation is meant to work element wise, 

E{r) wp*. 

Since we want E{r) — ^1, we should set fi — 0. Similarly, for j ^ j', 

We should set Xjji according to our expectations about recombination between 
two regions. 

Several of such crosses are compared with respect to their estimated phe- 
notypic distribution in Figure |9l As one would expect the variance of the dis- 
tributions from crosses increase as the plants become complimentary to each 
other. As a selection strategy we recommend selecting the crosses with high 3rd 
quartile for a trait where high values are preferable. 

4 Conclusions 

The approaches introduced allows us to use the input variables in naturally 
occurring blocks. In the context of the SPMM in (1) there are very fast al- 
gorithms that can take advantage of this dimension reduction. For the linear 
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kernel function, the order of calculations to solve a SPMM with one kernel ma- 
trix is proportional to min{n, m) where m here is the number of features in that 
kernel. No matter what the input dimension is SPMM parameter estimation 
involves matrices of order n. Therefore, the multiple kernel approach overcomes 
the memory problems that we might incur when the number of markers is very 
large. 

The local kernels use information collected over a region in the genome and, 
because of linkage, will not be effected by a few missing or erroneous data points, 
so this approach is also robust to missing data and outliers. 

For QTL identification, we have recommended a nested sequential approach 
that levels of views of the genome which might lead to faster exploration of the 
whole genome for quantitative trait loci. Recently popular deep learning algo- 
rithms try to learn levels of attributes and hypothesis [1] . Also active learning 
algorithms search for the parts of the data to obtain information in a step- 
wise fashion [21] . " Testing along a tree of hypotheses" approach to association 
studies is related to these main stream methods of machine learning. 

Sexual gene transfer methods have been used successfully in plant breed- 
ing for thousands of years. More recently, nonsexual methods have also been 
incorporated. The multiple kernel mixed model approach allow us to evaluate 
the utility of genome regions. This, in turn, allow us to build models with 
good prediction accuracy and, more importantly, aides us in our action: Which 
plants, chromosomes or genome regions should be kept in our breeding program? 
Which crosses are most useful? Which genome regions should be transfered be- 
tween individuals? Success with genomic selection partially depends on good 
prediction models and partially on utilization of this kind of information and 
new emerging technologies. 

Although we have focused our attention to classical breeding with crosses 
among selected parents, a short cut to breeding better plants and animals is 
possible by isolation and fusion of individual chromosomes or genomic regions. 
This type of breeding, which we call chromosomal breeding, involves to breeding 
better chromosomes and combining them. The authors believe that the plants 
obtained by passing chromosomes within species or families involve minimal 
genetic manipulation. 
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Figure 6: The local heritability values from the kernel scanning approach. The 
true QTL and effect sizes are superimposed on the estimated regional effects 
(Horizontal bars for true effects, colored points for estimated effects in chromo- 
somes) . 
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Kernel scanning 



3 380 709 33 31 39 8 92 70 73 49 4 790 94 46 93 419 5 347 69 19 84 1 423 54 



Kernel scanning with density weighting 



3 550 214 889 541 129 292 143 964 95 65 270 66 118 614 36 916 60 67 1 311 226 



Kernel scanning (best 50) 
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304 769 226 954 792 727 777 663 146 285 127 200 535 828 54 945 800 



Kernel scanning with density weighting (best 50) 



ml 



186 411 226 117 285 957 849 663 194 828 610 673 773 671 764 341 800 



Figure 7: Individuals sorted based on their weighted blups and regular blups. 
About 30 of the best 50 individuals were common for both approaches. 
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Figure 9: Approximate distribution of the phenotype of several crosses. The 
vertical lines in the histogram denote the EBV of parents. In this toy example, 
cross 5 is expected to produce progenies with higher trait values. 
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